Conversation

@NathanHB NathanHB commented Oct 31, 2025

To run:

lighteval endpoint inference-providers "model_name=openai/gpt-oss-20b,provider=hyperbolic,generation_parameters={max_new_tokens:8192}" "lighteval|mmlu_pro|0" --save-details

@NathanHB NathanHB requested a review from Copilot October 31, 2025 13:13

Copilot AI left a comment


Pull Request Overview

This PR adds support for the MMLU Pro benchmark, a multiple-choice question answering task from the TIGER-Lab/MMLU-Pro dataset.

  • Introduces a new MMLU Pro task configuration
  • Implements a custom prompt function for MMLU Pro questions
  • Configures evaluation on the test split with validation for few-shots
Comments suppressed due to low confidence (8)

src/lighteval/tasks/tasks/mmlu_pro.py:74

  • The task configuration is missing the generation_size parameter, which is required for generative metrics like gpqa_instruct_metric. Based on similar tasks using this metric (e.g., gpqa.py lines 57, 73, 89), a value such as generation_size=30 or generation_size=32768 should be specified, depending on whether reasoning traces are expected.

src/lighteval/tasks/tasks/mmlu_pro.py:74

  • The task configuration is missing the stop_sequence parameter. Given the generative nature of the task and similar configurations (e.g., gpqa.py lines 59, 75, 91), stop_sequence=[] should be explicitly set so that generation stops only at the EOS token.

src/lighteval/tasks/tasks/mmlu_pro.py:23

  • Import of 'LogLikelihoodAccMetric' is not used. (Diff context: the module docstring cites https://arxiv.org/abs/2406.01574, followed by "from string import ascii_uppercase".)
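The two missing-parameter comments above could be addressed along these lines. This is a hypothetical fragment, not the PR's final code: the field names (generation_size, stop_sequence) come from the review's gpqa.py references, and the specific value chosen is an assumption.

```python
# Hypothetical fragment; assumes lighteval's LightevalTaskConfig API.
from lighteval.tasks.lighteval_task import LightevalTaskConfig  # assumed import path

mmlu_pro = LightevalTaskConfig(
    name="mmlu_pro",
    # ... other fields (prompt function, dataset, metrics) unchanged ...
    generation_size=32768,  # assumption: leave room for reasoning traces, per the gpqa.py precedent
    stop_sequence=[],       # empty list: stop only at the model's EOS token
)
```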

src/lighteval/tasks/tasks/mmlu_pro.py:25

  • Imports of 'LogProbCharNorm', 'LogProbPMINorm', and 'LogProbTokenNorm' are not used. (Diff context: "from lighteval.metrics.metrics import Metrics".)

src/lighteval/tasks/tasks/mmlu_pro.py:27

  • Import of 'get_metrics_for_formulation' is not used. (Diff context: "from lighteval.tasks.requests import Doc".)

src/lighteval/tasks/tasks/mmlu_pro.py:29

  • Import of 'get_mcq_prompt_function' is not used.

src/lighteval/tasks/tasks/mmlu_pro.py:34

  • Imports of 'CFFormulation', 'HybridFormulation', and 'MCFFormulation' are not used.

src/lighteval/tasks/tasks/mmlu_pro.py:35

  • Import of 'Language' is not used.

Diff context, the prompt template:

TEMPLATE = """
Answer the following multiple choice question. The last line of your response should be of the following format: 'Answer: $LETTER' (without quotes) where LETTER is one of ABCD. Think step by step before answering.

{question}

{choices}
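An illustrative sketch (not the PR's actual code) of how a template like this is typically used: build the prompt by labeling each option with a letter, then pull the final answer letter back out of a model response, as an instruct-style metric such as gpqa_instruct_metric does. Function names here are made up for illustration.

```python
import re
from string import ascii_uppercase

# Template text taken from the diff context above; trailing backslashes
# keep the instruction on one logical line.
TEMPLATE = """\
Answer the following multiple choice question. The last line of your response \
should be of the following format: 'Answer: $LETTER' (without quotes) where \
LETTER is one of ABCD. Think step by step before answering.

{question}

{choices}"""


def build_prompt(question, options):
    """Label each option A), B), C), ... and fill the template."""
    choices = "\n".join(
        f"{letter}) {text}" for letter, text in zip(ascii_uppercase, options)
    )
    return TEMPLATE.format(question=question, choices=choices)


def extract_answer(response):
    """Return the letter from the last 'Answer: X' occurrence, or None."""
    matches = re.findall(r"Answer:\s*([A-Z])", response)
    return matches[-1] if matches else None
```

Taking the last match rather than the first tolerates chain-of-thought text that happens to mention "Answer:" before the final line.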


@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@NathanHB NathanHB merged commit fa4860f into main Nov 4, 2025
5 checks passed


3 participants